Thermodynamics of Lattice Heteropolymers 
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We calculate thermodynamic quantities of HP lattice proteins by means of a multicanonical 
chain growth algorithm that connects the new variants of the Pruned-Enriched Rosenbluth Method 
(nPERM) and fiat histogram sampling of the entire energy space. Since our method directly sim- 
ulates the density of states, we obtain results for thermodynamic quantities of the system for all 
temperatures. In particular, this algorithm enables us to accurately simulate the usually difficult 
accessible low-temperature region. Therefore, it becomes possible to perform detailed analyses of 
the low-temperature transition between ground states and compact globules. 
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I. INTRODUCTION 

The native conformation of a protein is strongly corre- 
lated with the sequence of amino acid residues building- 
up the heteropolymer. The sequence makes the protein 
unique and assigns it a specific function within a bio- 
logical organism. The reason is that the different types 
of amino acids vary in their response to the environment 
and in their mutual interaction. It is a challenging task to 
reveal on what general principles the folding process of a 
protein is based. Models differ extremely in their level of 
abstraction, ranging from simple and purely qualitative 
lattice models to highly sophisticated all-atom off-lattice 
formulations with explicit solvent that partially yield re- 
sults comparable with experimental data. Due to the 
enormous computational effort required for simulations 
of realistic proteins, usually characteristic properties of a 
protein with a given sequence are studied in detail. Much 
simpler, but by no means trivial, lattice models enjoy a 
growing interest, since they allow a more global view on, 
for example, the analysis of the relation between sequence 
and structure. 

In this paper, we shall focus ourselves on thermody- 
namic properties of lattice proteins at all temperatures. 
In particular, this includes the investigation of the tran- 
sitions between the different classes of states: lowest- 
energy states, compact globules, and random coils. Since 
the ground-state-globule transition occurs at rather low 
temperatures, a powerful algorithm is required that in 
particular allows a reasonable sampling of the low-lying 
energy states. To this end we combined multicanonical 
strategies [1-3] with chain growth algorithms [4-7] to a 
new method [8] which works temperature-independent 
and directly simulates the density of states. This quan- 
tity contains all energetic information necessary for com- 
puting the mean energy, free energy, entropy, and specific 
heat for all temperatures. In the following, we present re- 



sults obtained from the application of this method to dif- 
ferent lattice proteins with lengths up to 103 monomers, 
modelled by the simplest lattice formulation for het- 
eropolymers, the HP model [9], In this model, only two 
types of monomers enter, hydrophobic (H) and polar (P) 
residues. The model is based on the assumption that 
the hydrophobic interaction is one of the fundamental 
principles in protein folding. An attractive hydropho- 
bic interaction provides for the formation of a compact 
hydrophobic core that is screened from the aqueous en- 
vironment by a shell of polar residues. Therefore the 
energy function reads 
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where < i — 1) denotes summation over nearest 

lattice neighbours that are nonadjacent along the self- 
avoiding chain of monomers. A hydrophobic monomer 
at the ith position in the chain has Oi — 1 and a polar 
monomer is assigned <Ji = 0. 

The first part of this paper, where we explain how our 
new method works, will be of a more technical nature, 
while the second part is devoted to applications of this 
algorithm to heteropolymers. In Section II we discuss 
the main differences of Monte Carlo methods based on 
move sets for updating conformations and chain growth 
algorithms as well as their peculiarities. This is fol- 
lowed by Section III about the thermodynamic quanti- 
ties that will be estimated with our method. Then, in 
Section IV, we enter into the description of the multi- 
canonical chain growth algorithm. This technical part is 
preluded by recalling the essential ingredients of PERM 
and nPERM?| [4-7] as these are fundamental for setting 
up our algorithm. Then we proceed with the explanation 
of multicanonical chain growth and the determination of 
the multicanonical weight factors. Section V is devoted 
to the validation of our method, and in Section VI we 
present the results obtained with our algorithm. There 
we focus on thermodynamic properties of heteropolymers 
with sequences of more than 40 monomers. Finally, we 
summarise the main aspects of this paper in Section VII. 
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II. MOVE SETS VS. CHAIN GROWTH 

Polymers fold on a lattice into conformations that are 
by definition self-avoiding. This takes into account the 
finite volume and the uniqueness of the monomers. A 
lattice site can hence be occupied by a single monomer 
only. This has the consequence that the number of very 
dense conformations of a polymer is by orders of mag- 
nitude lower than that of random coil states. In Monte 
Carlo simulations, particular attention must therefore be 
devoted to efficient update procedures which also allow 
the sampling of dense conformations. In polymer simu- 
lations, so-called move sets were applied with some suc- 
cess to study the behaviour near the 9 point, which de- 
notes the phase transition, where polymers subject to 
an attractive interaction collapse from random coils to 
compact conformations. Move sets being widely used 
usually consist of transformations that change the posi- 
tion of a single monomer and a single bond vector (end 
flips), a single monomer position but two bonds (cor- 
ner flips), two positions and three bonds (crankshaft) 
or moves with more changes, and pivot rotations, where 
the zth monomer serves as pivot point and one of the 
two partial chains connected with it is rotated about any 
axis through the pivot [10]. For (a-thermal) self-avoiding 
walks the latter method is known to be very efficient [11]. 
It becomes inefficient, however, the more dense the con- 
formation is. At low temperatures, the acceptance rate 
of locally changing a dense conformation decreases dras- 
tically and the simulation threatens to get stuck in a 
specific conformation or to oscillate between two states. 
Since the search for ground states is an essential aspect 
of studying lattice proteins, the application of move sets 
is not very useful, at least for chains of reasonable length. 

A more promising alternative is the completely dif- 
ferent approach based on chain growth. The polymer 
grows by attaching the nth monomer at a randomly cho- 
sen next-neighbour site of the (n — l)th monomer. The 
growth is stopped, if the total length N of the chain 
is reached or the randomly selected continuation of the 
chain is already occupied. In both cases, the next chain 
is started to grow from the first monomer. This sim- 
ple chain growth is also not yet very efficient, since the 
number of discarded chains grows exponentially with the 
chain length. The performance can be improved with the 
Rosenbluth chain growth method [12], where first the free 
next neighbours of the (n — l)th monomer are determined 
and then the new monomer is placed to one of the un- 
occupied sites. Since the probability of each possibility 
for the next monomer to be set varies with the number 
of free neighbours, this implies a bias given by 
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(2) 



where mi is the number of free neighbours to place the 
Ith monomer. The bias is corrected by assigning a Rosen- 
bluth weight factor W R ~ Vn 1 to each chain that has 



been generated by this procedure. Nevertheless, this 
method suffers from attrition too: If all next neighbours 
are occupied, i.e., the chain was running into a "dead 
end" (attrition point), the complete chain has to be dis- 
carded and the growth process has to be started anew. 

Combining the Rosenbluth chain growth method with 
population control, however, as is done in PERM 
(Pruned-Enrichcd Rosenbluth Method) [4, 5], leads to 
a further considerable improvement of the efficiency by 
increasing the number of successfully generated chains. 
This method renders particularly useful for studying the 
point of polymers, since then the Rosenbluth weights 
of the statistically relevant chains approximately cancel 
against their Boltzmann probability. The (a-thermal) 
Rosenbluth weight factor W R is therefore replaced by 
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where T is the temperature and Ei is the energy of the 
partial chain X; = (xi, . . . ,x;) created with Rosenbluth 
chain growth. In PERM, population control works as fol- 
lows. If a chain has reached length n, its weight W RERM 
is calculated and compared with suitably chosen upper 
and lower threshold values, and W^, respectively. 
For W RERM > W> , identical copies are created which 
grow then independently. The weight is equally divided 
among them. If IF^ ERM < W<, the chain is pruned 
with some probability, say 1/2, and in case of survival, 
its weight is doubled. For a value of the weight lying 
between the thresholds, the chain is simply continued 
without enriching or pruning the sample. 

In the recently developed new variants nPERM?| [6], 
the number of copies is not constant and depends on 
the ratio of the weight W RERM compared to the upper 
threshold value W„ and the copies are necessarily cho- 
sen to be different (the method of selecting the copies is 
based on simple sampling (ss) in nPERMss and a kind 
of importance sampling (is) in nPERMis). This proves 
quite useful in producing highly compact polymers and 
therefore these new methods are very powerful in deter- 
mining lowest-energy states of lattice proteins. 



III. DENSITY OF STATES AND 
THERMODYNAMIC QUANTITIES 

In order to investigate the thermodynamic properties 
of lattice proteins accurate simulations are necessary. 
Due to the difficulties with the update of conformations 
at low temperatures and a primary interest in detecting 
lowest-energy conformations, only a few results of ther- 
modynamic quantities are found in the literature. Nev- 
ertheless, the understanding of the conformational tran- 
sitions and the dependence of their sharpness on the se- 
quence is only possible with algorithms that yield good 
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results for low as well as for high temperatures. Conse- 
quently, reasonable results can only be obtained, if the 
method allows the sampling of the entire energy space. 

All energetic statistical quantities of the protein can 
be expressed by means of the density of energetic states 
g(E), since the partition function of a lattice protein with 
given sequence can be written as 
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where (3 = l/T is the inverse of the thermal energy in nat- 
ural units. The sum in the first representation runs over 
all possible realisations of self-avoiding walks on the lat- 
tice, while in the second expression the sum is taken over 
all energetic states a lattice protein can adopt. Then, 
the expectation value of any energetic observable 0{E) 
is simply 



(0(E))(T) = ^J2°( E ^(Ei)e-^, 



(5) 



and the mean energy as the negative logarithmic deriva- 
tive of Z with respect to (3 is given by 



(6) 



With these expressions, the specific heat Cy = d(E)/dT 
obeys the fluctuation formula 



Cy(T) = -L «£*) - (25)2) 



(7) 



Moreover, knowing g(E), the Hclmholtz free energy is 
obtained from 



F(T) = -Tln^giEJexpi-PEi} 



and the entropy can be calculated as 



1 



S(T) = -[(E)(T)-F(T)}. 



(8) 
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In addition, non-energetic structural quantities are of 
interest for discussing the compactness of conformations, 
such as the end-to-end distance 



R cc = \x N - xi| 
and the radius of gyration 
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where Xo = X)z=i X i/N is the centre of mass of the con- 
formation (with all monomers having equal mass). For 
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the calculation of mean values of non-energetic quantities 
O, Eq. (5) is replaced by the general formula 
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In the following, we shall focus ourselves on the study 
of these thermodynamic quantities for different HP lat- 
tice proteins and develop an algorithm that allows a di- 
rect simulation of the density of states. 



IV. MULTICANONICAL CHAIN GROWTH 
ALGORITHM 

Before we describe the idea behind the multicanoni- 
cal chain growth algorithm and the iterative determina- 
tion of the multicanonical weights which are related to 
the density of states, we first recall the canonical chain 
growth variants nPERMf|, on which our algorithm builds 
up. 



A. Canonical Chain Growth 

In the original chain growth algorithm PERM [4], the 
sample of chains with length n < N is enriched by identi- 
cal copies if the weight factor (3) is bigger than a thresh- 
old value W> ■ In order to obey correct Boltzmann statis- 
tics, the weight is divided among the clones. If, however, 
jyPERM < w< f the chain is pruned with, e.g. probability 
1/2, requiring the weight of a surviving chain to be taken 
twice. Then one attaches a new monomer at a randomly 
chosen free next-neighbour site of the previous one. This 
is done for all chains that still exist and the procedure is 
repeated until the total chain length N is reached or the 
growth of a chain was terminated by a dead end or due 
to pruning. After all chains created within this tree have 
grown until their end is reached and thus the present so- 
called tour is finished, a new growth process starts from 
the first monomer, i.e., a new tour begins. Having cre- 
ated an appropriate number of chains with length n, they 
will be canonically distributed at the given temperature 
T. In fact, this is also true for all partial chains with 
intermediate lengths n < N, but there are strong corre- 
lations between chains with different lengths n. 

In the recently proposed new PERM variants 
nPERM?| (new PERM with simple/importance sampling 
(ss/is)) a considerable improvement is achieved by creat- 
ing different copies, i.e., the chains are identical in (n — 1) 
monomers but have different continuations, instead of 
completely identical ones, since identical partial chains 
usually show up a similar evolution. Because of the dif- 
ferent continuations, the weights of the copies can differ. 
Therefore it is not possible to decide about the number 
of copies on the basis of a joint weight. The suggestion 
is to calculate first a predicted weight which is then com- 
pared with the upper threshold W> in order to determine 
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the number of clones. Another improvement of PERM 
being followed up since first applications to lattice pro- 
teins is that the threshold values W> and W£ are no 
longer constants, but are dynamically adapted with re- 
gard to the present estimate for the partition sum and 
to the number of successfully created chains with length 
n. The partition sum is proportional to the sum over the 
weight factors of all conformations of chains with length 
n, created with a Roscnbluth chain growth method like, 
for instance, nPERMf|: 



Z n — 



1 



(Xn,t). 



(13) 



Here, X„ it denotes the tth generated conformation of 
length n. The proportionality constant is the inverse 
of the number of successful tours, i.e., the number of 
chain growth starts M tours that led to the generation of 
at least one chain of length n. Note that due to this 
normalisation it is possible to estimate the degeneracy of 
the energy states. This is in striking contrast to impor- 
tance sampling Monte Carlo methods, where the overall 
constant on the r.h.s. of Eq. (13) cannot be determined 
and hence only relative degeneracies can be estimated. 

Since nPERMss and nPERMis, respectively, are possi- 
ble fundamental ingredients for our algorithm, it is use- 
ful to recall in some detail how these chain growth algo- 
rithms work. The main difference in comparison with the 
original PERM is that, if the sample of chains of length 
n — 1 shall be enriched, the continuations to an unoc- 
cupied next-neighbour site have to be different, i.e., the 
weights of these chains with length n can differ. There- 
fore it is impossible to calculate a uniform weight like 
W^ ERM as given in Eq. (3) before deciding whether to 
enrich, to prune, or simply to continue the current chain 
of length n— 1. As proposed in Ref. [6], it is therefore use- 
ful to control the population on the basis of a predicted 
weight W% Icd which is introduced as 



VF„ prcd = W. 



nPERMf, 1 
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Xa 

Q = l 



(14) 



where m n denotes the number of free neighbouring 
sites to continue with the nth monomer. The "impor- 
tances" X £ PERM - differ for nPERMss and nPERMis. 
Due to its characterisation as a simple sampling algo- 
rithm (nPERMss), where all continuations are equally 
probable, and as a method with importance sampling 
(nPERMis), the importances may be defined as 



nPERMss 



Xc 
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x „PERMis = ( m (a) + 



(15) 
(16) 

The expression for nPERMis involves the energy E ( n a) of 
the choice a € [1, m n ] for placing the nth monomer and 
the number of free neighbours of this choice which 
is identical with m„+i, provided the cuth continuation 



was indeed selected for placing the nth monomer. Since 
^.nPERMis C ontains informations beyond the nth continu- 
ation of the chain, nPERMis controls the further growth 
better than nPERMss. The predicted weight for the nth 
monomer is now used to decide how the growth of the 
chain is continued. If the predicted weight is bigger than 
the current threshold, WP red > W>, and m n > 1, the 
sample of chains is enriched and the number of copies 
k is determined according to the empirical rule k = 
min[m„,int(VKP rod /VK>)]. Thus, 2 < k < m n different 
continuations will be followed up. Using nPERMss, the k 
continuations are chosen randomly with equal probability 
among the m n possibilities, while for nPERMis the prob- 
ability of selecting a certain fc-tuple A = {ct\, . . . , a>k} of 
different continuations is given by 



PA 
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(17) 



Considering the probabilities pa as partial intervals of 
certain length, arranging them successively in the total 
interval [0, 1] (since ^2^Pa = 1), and drawing a random 
number r £ [0,1), one selects the tuple whose interval 
contains r. This tuple of different sites is then chosen to 
continue the chain. The corresponding weights are [6] : 
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nPERMf; 



n, aj 
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nPERMf" 



k 



PA 



(18) 

where j G {1, . . . , fc} is the index of the a^th continu- 
ation within the tuple A. In the special case of sim- 
ple sampling this expression reduces to W^ p J^ RMss = 
w -nPERMss mn exp[-/?(£;i aj ) - E n -i)]/k. If the predicted 
weight is less than the lower threshold, W% rcd < W<, 
however, the growth of this chain is stopped with prob- 
ability 1/2. In this case, one traces the chain back to 
the last branching point, where the growth can be con- 
tinued again, or, if there are no branching points, a new 
tour is started. If the chain survives, the continuation 
of the chain follows the same procedure as described 
above, but now with k = 1, where Eq. (17) simplifies to 
PA = Xa/ E Q Xa since A = {a}. In this case the weight 
of the chain is taken twice. For W< < W^ rcd < W> , the 
chain is continued without enriching or pruning (once 
more with k = 1). 

The first tour, where the nth monomer is attached for 
the first time, is started with bounds set to — oo and 
Wjf = 0, thus avoiding enrichment and pruning. For the 
following tours, we use 



W>=C 
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(19) 



where Z\ = c\ is the partition function of chains with 
length unity and Cj the number of created chains with i 
monomers. The constant C < 1 is some positive number 
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and controls the number of successfully generated chains 
per tour. For the lower bound we use W< = 0.2W>. All 
these choices are in correspondence to Ref. [6] . 



B. Multicanonical Sampling of 
Rosenbluth- Weighted Chains 

The idea behind our multicanonical chain growth 
method is to flatten the canonical energy distribution 
provided by nPERM;|. For a given temperature, the 
latter algorithms yield accurate canonical distributions 
over some orders of magnitude. In order to construct 
the entire density of states, standard reweighting pro- 
cedures may be applied, requiring simulations for dif- 
ferent temperatures [3]. The low-temperature distribu- 
tions are, however, very sensitive against fluctuations of 
weights which inevitably occur because the number of 
energetic states is low, but the weights are high. Thus, 
it is difficult to obtain a correct distribution of energetic 
states, since this requires a reasonable number of hits of 
low-energy states. Therefore we assign the chains an ad- 
ditional weight, the multicanonical weig ht factor V^ at , 
chosen such that all possible energetic states of a chain of 
length n possess almost equal probability of realisation. 
The first advantage is that states having a low Boltzmann 
probability compared to others are hit more frequently. 
Secondly, the multicanonical weights introduced in that 
manner are proportional to the inverse canonical distri- 
bution at temperature T, W% at (E) ~ l/P™ n > T (E), re- 
spective the inverse density of states 



(20) 



for T — > oo. Thus, only one simulation is required and a 
multi-histogram reweighting is not necessary. An impor- 
tant conceptual aspect is the fact that the multicanonical 
weight factors are unknown in the beginning and have to 
be determined iteratively. 

Before we discuss the technical aspects regarding our 
method, we first explain it more formally. The energy- 
dependent multicanonical weights are trivially intro- 
duced into the partition sum (13) as suitable "decom- 
position of unity" in the following way: 



1 



Mt, 



nPERMJ, 



xIF„ flat (£(X„, t )) [W n flat (£(X„, t ))] 1 . (21) 



Since we are going to simulate at infinite temperature, we 
express with (20) the partition sum which then coincides 
with the total number of all possible conformations as 

Z n = -jj— yV(£(X n>t ))W r n (X n , t ) (22) 

J^tours t 

with the combined weight 

W„(Xn,t) = lK PERML8 (X„ ;t X at ( J B(X n , t )). (23) 



Taking this as the probability for generating chains of 
length n, p n ~ W n , leads to the desired flat distribution 
H n (E), from which the density of states is obtained by 



9n(E) 



HnjE) 



(24) 



The canonical distribution at any temperature T is calcu- 
lated by simply reweighting the density of states to this 
temperature, P™ n ' T (E) ~ g n (E) exp (-E/T). 



C. Iterative Determination of the Density of States 

In the following, we describe our procedure for the iter- 
ative determination of the multicanonical weights, from 
which we obtain an estimate for the density of states. 
Since there are no informations about an appropriate 
choice for the multicanonical weights in the beginning, 
we set them in the zeroeth iteration for all chains 2 < 
n < N and energies E equal to unity, W^ at '^°' (E) = 1, 
and the histograms to be flattened are initialised with 
Hn\E) = 0. These assumptions render the zeroeth it- 
eration a pure nPERM?| run. 

Since we set (3 = from the beginning, the accumu- 
lated histogram of all generated chains of length n, 



H^(E)=J2w:^S EtE 



(25) 



is a first estimate of the density of states. In order to 
obtain a flat histogram in the next iteration, we update 
the multicanonical weights 



W^ {1) (E) 



W, 



flat,(0) 



(E) 



Hk°\E) 



Vn,E 



(26) 



and reset the histogram, (E) 



0. 



The first and all following iterations are multicanon- 
ical chain growth runs and proceed along similar lines 
as described above, with some modifications. The pre- 
diction for the new weight follows again (14), but the 
importances Xa (15) are in the ith iteration introduced 
as 
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W, 



flat,(i) 
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(27) 



In the simple sampling case, we still have x«' W = 1. If 
the sample is enriched (W^ lcd > W>) the weight (18) of 
a chain with length n choosing the <x,th continuation is 
now replaced by 
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(28) 



where in the simple sampling case (ss) pa and the bino- 
mial factor again cancel each other. If W£ < W^ rcd < 
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an nth possible continuation is chosen (selected as 
described for the enrichment case, but with k = 1) and 
the weight is as in Eq. (28). Assuming that W% rcd < W< 
and that the chain has survived pruning (as usual with 
probability 1/2), we proceed as in the latter case and the 
chain is assigned twice that weight. The upper threshold 
value is now determined in analogy to Eq. (19) via 
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where 



(29) 



(30) 



is the estimated partition sum according to the new dis- 
tribution provided by the weights (28) for chains with n 
monomers and Zf at = Z\ = c\. Whenever a new itera- 
tion is started, Z^ at , c„, W< are reset to zero, and W> 
to infinity (i.e. to the upper limit of the data type used 
to store this quantity). If a chain of length n with the 
energy E was created, the histogram is increased by its 
weight: 



HW(E) 



'Et E ■ 



(31) 



From iteration to iteration, this histogram approaches 
the desired flat distribution H n (E) and after the final 
iteration i = I, the density of states is estimated by 



Hi I] {E) 



W, 



flat, (7) 



(E) 



2<n<N, 



(32) 



in analogy to Eq. (24). 

In our simulations, we usually performed up to 30 
iterations. The runs to / — 1 were usually termi- 
nated after 10 5 -10 6 chains of total length N had been 
produced, while in the measuring run (i = I) usually 
10 7 -10 9 conformations are sufficient to obtain reasonable 
statistics. The parameter C in Eq. (29) that controls 
the pruning/enrichment statistics and thus how many 
chains of complete length N are generated per tour, 
was set to C = 0.01, such that on average 10 complete 
chains were successfully constructed within each tour. 
With this choice, the probability for pruning the cur- 
rent chain or enriching the sample was about 20%. In 
almost all started tours M at least one chain achieved 
its complete length. Thus the ratio between started and 
successfully finished tours M succ is approximately unity, 
M succ /M « 1, assuring that our algorithm performed 
with quite good efficiency 

Unlike typical applications of multicanonical or flat 
histogram algorithms in importance sampling schemes, 
where all energetic states become equally probable such 
that the dynamics of the simulation corresponds to a ran- 
dom walk in energy space, the distribution to be flattened 
in our case is the histogram that accumulates the weights 



of the conformations. Hence, if the histogram is flat, a 
small number of high-weighted conformations with low 
energy E has the same probability as a large number of 
appropriate conformations with energy E' > E carrying 
usually lower weights. Therefore the number of actual 
low-energy hits remains lower than the number of hits of 
states with high energy. In order to accumulate enough 
statistics in the low-energy region, the comparative large 
number of generated conformations in the measuring run 
is required. 

We have also implemented multicanonical chain 
growth simulations, where we were going to flatten the 
"naked" energy distribution, i.e., we tried to equalise 
the number of hits for all energetic states. The prob- 
lem is that this contradicts the philosophy of Roscnbluth 
chain growth methods, where the bias connected with 
the Roscnbluth weight controls the population of sam- 
ples. Therefore lowest-energy states were not "tuned" by 
this bias and not hit accordingly. For applications with- 
out special focus to the low-temperature region, it may 
be, however, an appropriate alternative to the procedure 
described above and should be pursued further. 



V. VALIDATION AND PERFORMANCE 

Before we discuss the physical results obtained with 
the multicanonical chain growth algorithm, we first re- 
mark on tests validating the method. We compared 
the specific heat for very short chains with data from 
exact enumeration and found that our method repro- 
duces the exact results with high accuracy. For a chain 
with 42 monomers, where exact results are not avail- 
able, we performed a multi-histogram rewcighting [13] 
from canonical distributions at different temperatures ob- 
tained from original nPERMis runs. Here, it turned out 
that our method shows up a considerably higher per- 
formance (higher accuracy in spite of lower statistics at 
comparable CPU times). We also compared with im- 
plementations based on sophisticated importance sam- 
pling Monte Carlo schemes, e.g., we have also performed 
multicanonical sampling [1, 2] and Wang-Landau simu- 
lations [14] in combination with conformational updates 
different from chain growth (e.g. move sets as described 
in Section II). For the present applications, however, all 
of these attempts proved to be less efficient. 



Comparison with Results from Exact 
Enumeration 



As a first validation of our method, we apply it to 
a set of 14mers with some interesting properties (see 
Table I) regarding the relation between their ground- 
state degeneracy and the strength of the low-temperature 
conformational transition between lowest-energy states 
and compact globules [8, 15]. In finite-size systems, 
(pseudo-)transitions are usually identified through struc- 
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TABLE I: Sequences, hydrophobicity uh, and global mini- 
mum energy i? m i n with degeneracy pg x (without rotations, re- 
flections, and translations) of the exactly enumerated 14mers 
used for validation of our algorithm. The last column contains 
the predictions for the ground-state degeneracy obtained with 
our method. 



No. sequence hh 
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14.1 HPHPH 2 PHPH 2 P 2 H 8 


-8 1 
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0.03 


14.2 H2P2HPHPH2PHPH 8 


-8 2 
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0.07 


14.3 H2PHPHP2HPHPH2 8 


-8 2 


2.00 


± 


0.06 


14.4 H 2 PHP 2 HPHPH 2 PH 8 
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3.99 
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0.13 
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FIG. 1: Logarithmic plot of the relative errors of our estimates 
for the specific heats of the 14mers given in Table I. 

tural peculiarities (maxima for strong transitions or 
"shoulders" for weak transitions) in the temperature- 
dependent behaviour of fluctuations of thermodynamic 
quantities. Usually, it is hard to obtain a quite accurate 
estimation of the fluctuations in the low-temperature re- 
gion. Thus it is a good test of our method to calculate 
fluctuating quantities for the 14mers listed in Table 1 
and to compare with results that are still available by 
exactly enumerating all possible 943 974 510 conforma- 
tions (except translations) [15]. Therefore we determined 
with our method the densities of states for these 14mers 
and calculated the fluctuations of the energy around the 
mean value in order to obtain the specific heat accord- 
ing to Eq. (7). We generated 10 9 chains and the re- 
sults for the specific heats turned out to be highly ac- 
curate. This is demonstrated in Fig. 1, where we have 
plotted for the exemplified Miners the relative errors 
e{T) = \C^{T)-C V (T)\/C^{T) of our estimates CV(T) 
compared with the specific heats Cy x (T) obtained by the 
exact enumeration procedure. We see that, except for 
very low temperatures, the relative error is uniformly 
smaller than 1CP 3 



B. Multiple Histogram Reweighting 

The calculation of the density of states by means of 
canonical stochastic algorithms cannot be achieved by 



simply reweighting one canonical histogram, obtained for 
a given temperature, to as many as necessary distribu- 
tions to cover the whole temperature region, since the 
overlap between the sampled distribution and most of 
the reweighted histograms is too small [3]. As this sim- 
ple reweighting only works in a certain region around 
the temperature the simulation was performed, a mul- 
tiple application of the reweighting procedure at differ- 
ent sampling temperatures is necessary [13]. For the se- 
quence of a 42mer to be studied in detail in Section VI A, 
we performed the multiple reweighting of 5 overlapping 
histograms obtained by separate nPERMis runs at tem- 
peratures 0.3, 0.5, 0.8, 1.5, and 3.0 in order to estimate 
the density of states. The histograms as well as the re- 
sulting density of states are shown in Fig. 2, where we 
have also plotted the density of states being obtained by 
means of our multicanonical sampling algorithm. Each 
of the histograms contains statistics of 8 x 10 7 chains. 
This number was adequately chosen such that the den- 
sity of states from histogram reweighting matches within 
the error bars of the density of states obtained with our 
algorithm that also inherently supplies us with the ab- 
solute density of states. Note that these absolute values 
cannot be obtained by means of the multiple-histogram 
reweighting procedure, where the normalisation is ini- 
tially arbitrary. Our density of states was obtained by 
accumulating statistics of 5 x 10 7 chains. This means 
that 8 times more chains were necessary to approxi- 
mately achieve the accuracy with the multiple histogram 
reweighting method. The iterative period for the deter- 
mination of the multicanonical weights is no drawback, 
as it takes in our implementation only 10% compared to 
the production run. Therefore we conclude that our dy- 
namical method is more efficient and also more elegant 
than a static reweighting scheme, where also a reliable 
estimation of statistical errors is extremely cumbersome. 



Iogio9(£), 40 
\og w H T (E) 



1 

r = o.s 


1 


1 1 


1 1 


0.5 






\ log 10 H T (E) 


0.8 








1.5 - 


3.0 










logio 9(E) 




l 


l 


l l 


l l 



-35 -30 -25 -20 -15 -10 -5 

E 

FIG. 2: Histograms H T (E) obtained by single nPERMis 
runs for 5 different temperatures T = 0.3, 0.5, 0.8, 1.5, and 
3.0 (dashed lines). The resulting density of states g(E) ob- 
tained by multiple histogram reweighting (long dashed line) 
lies within the error bars of the density of states calculated 
by means of our method (solid line). 
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C. Flat-Histogram Algorithms with Update 
Mechanisms Different from Chain Growth 

The calculation of the density of states for heteropoly- 
mers with less than 40 monomers does not represent a 
big challenge. It is still possible to combine generalised 
ensemble methods like multicanonical sampling [1, 2] or 
the Wang-Landau method [14] with move sets includ- 
ing pivot rotations to yield reasonable statistics in the 
low-energy sector. In order to avoid the sophisticated 
implementation of the move sets, an alternative method 
can be used, for instance, where the conformational in- 
formation is encoded in a string of letters denoting the 
directions F(orward), B(ackward), R(ight), L(eft), U(p), 
and D(own) the walker may follow in an embedded coor- 
dinate system. This structural sequence is then updated 
by simply changing a letter. All these methods require a 
time-consuming self-avoidance check following each up- 
date. We have tested these combination of sampling and 
update methods for 14mers, where we could compare 
with exact results from enumeration, and applied it to a 
30mer with 20 energy states with rather high degenera- 
cies. All these states were frequently hit such that the re- 
sults were reasonable for all temperatures. Remarkably, 
it turned out, however, that the sampling of low-energy 
states becomes more problematic the lower the degen- 
eracy of these states is. Either the algorithm got stuck 
after hitting such a state, or it took a long time to find it 
for the first time. These were indications for a "hidden" 
conformational barrier that could not be circumvented 
with these procedures. 

Applying these methods to sequences with more than 
40 monomers did not yield reliable results. Low-energy 
states were too rarely or never hit in long-term simula- 
tions. Performing a biased simulation by explicitly start- 
ing from a state with lowest energy, i.e., initialising with 
a very dense conformation it took much too long until a 
new self-avoiding conformation was found and accepted. 
Comparing this with applications of the multicanonical 
chain growth method to these examples led us to the con- 
clusion that, in the application to lattice proteins, chain 
growth methods are much more capable of avoiding such 
barriers. 



VI. RESULTS 

In the following we focus on results which we ob- 
tained with the multicanonical chain growth algorithm 
for hctcropolymers with HP sequences of more than 40 
monomers. 



A. Lattice Model for Parallel (3 Helix with 42 
Monomers 

We consider a 42mer with the sequence PH2PHP- 
H2PHPHP2H3PHPH2PHPH3P2HPHPH2PHPH2P that 
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FIG. 3: Estimates for the density of states g^(E) for the 
42mer after different levels of iteration. Since the curves 
would fall on top of each other, we have added, for better 
distinction, a suitable offset to the curves of the 1th, 6th, and 
9th run. The estimate of the 0th run is normalised to unity. 
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FIG. 4: Mean end-to-end distance (R cc ) and mean radius of 
gyration (-Rgyr) of the 42mer. 

forms a parallel helix in the ground state. It was designed 
to serve as a lattice model of the parallel /3 helix of pec- 
tate lyase C [16]. But there are additional properties that 
make it an interesting and challenging system to which 
we want to apply our method first. The ground-state 
energy is known to be E m i n = —34. Moreover, the spe- 
cific heat has a very pronounced low-temperature peak 
that indicates a (pseudo-)transition between the lowest- 
energy states possessing compact hydrophobic cores and 
the regime of the globule conformations. This transition 
is in addition to the usual one between globules and ran- 
dom coils. 

Figure 3 shows how the estimate for the density of 
states of this 42mer evolves with increasing number of 
iterations. The 0th iteration is the initial pure nPERMis 
run at (3 = 0. This does not render, however, a proper 
image of the abilities of nPERMis which works much bet- 
ter at finite temperatures. Iterations 1 to 8 are used to 
determine the multicanonical weights over the entire en- 
ergy space E 6 [—34,0]. Then, the 9th iteration is the 
measuring run which gives a very accurate estimate for 
the density of states covering about 25 orders of magni- 
tude. Our estimate for the ground-state degeneracy is 
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FIG. 5: Specific heat Cv and derivatives w.r.t. temperature of 
mean end-to-end distance (-R ee ) and radius of gyration (R sy r) 
as functions of temperature for the 42mer. The ground-state 



- globule transition occurs between T (1) « 0.24 and T c 
0.28, while the globule - random coil transition takes place 



i(2) 



between T^ 1} « 0.53 and T^> « 0.70 (shaded areas) 
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FIG. 6: Temperature dependence of the fluctuations of energy 
<j%, end-to-end distance a 2 e , and radius of gyration <7g yr for 
the 42mer. Note the different scales for a^ a an d o"g yr - in 
the temperature interval plotted the variance of the radius of 
gyration (right scale) is much smaller than the variance of the 
end-to-end distance (left scale), a^ yl < a^ c . 



go = 3.9 ± 0.4, which is in perfect agreement with the 
known value <7q x = 4 (except translational, rotational, 
and reflection symmetries) [17]. 

The average structural properties at finite tempera- 
tures can be best characterised by the mean end-to- 
end distance (R CC )(T) and the mean radius of gyration 
(i? gyr )(T). As these quantities carry shape informations, 
their calculation is not exclusively based on the density 
of states <7iV"(i£(Xjv,t) and hence Eq. (5) cannot be ap- 
plied. Therefore expectation values of such quantities O 
are obtained from the time series of the measuring run 
of the multicanonical chain growth simulation at infinite 
temperature by using the general formula 



(0)(T) = J-£o(X* lt ) 



Zn 

X^(X 7V , t )<?Ar(^(X A r :t ))e-' 3B(X "' f 

with the estimate for the partition sum Zn 
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FIG. 7: Canonical distributions for the 42mer at temperatures 
(a) T = 0.24, 0.25, . . . , 0.30 close to the ground-state - globule 
transition region between T (1) w 0.24 and T (2) ps 0.28, (b) 
T — 0.50, 0.55, 1.0. The high-temperature peak of the 
specific heat in Fig. 5 is near « 0.53, but at t[ 2) t* 0.73 
the distribution has the largest width, cf. Fig. 6. Near this 
temperature, the mean radius of gyration and the mean end- 
to-end distance (see Figs. 4 and 5) have their biggest slope. 



Z t W N (X Ntt )g N (E{X N , t ))ex.p{-!3E(X Ntt )}. Our re- 
suits for (R ee )(T) and (i? gyr )(T) of the 42mer are shown 
in Fig. 4. Here and in the following figures, the statisti- 
cal errors were estimated by using the jackknifc binning 
method [18]. The pronounced minimum in the end-to- 
end distance can be interpreted as an indication of the 
transition between the lowest-energy states and globules: 
The low number of ground states have similar and highly 
symmetric shapes (due to the reflection symmetry of the 
sequence) but the ends of the chain are polar and there- 
fore they are not required to reside close to each other. 
Increasing the temperature allows the protein to fold into 
conformations different from the ground states and con- 
tacts between the ends become more likely. Therefore, 
the mean end-to-end distance decreases and the protein 
has entered the globule "phase". Further increasing the 
temperature leads then to a disentangling of the globules 
and random coil conformations with larger end-to-end 
distances dominate. From Fig. 5, where we have plotted 
the specific heat and the derivatives of the mean end- 
to-end distance and of the mean radius of gyration with 
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FIG. 8: Logarithmic plots of the densities of states g(E) and 
"fiat" histograms H(E) for the sequences 48.1, 48.5, 48.6, 
and 48.7 from Table II that have the same lowest energy 
—32. The normalisation of the histograms of these 
examples was chosen such that they coincide at maximum 
energy, log 10 -ff(-E max = 0) = 1. 

respect to the temperature, 

-^{Ree)(T) = ^((ER ee )-(E)(R ee )), (34) 
-^(R syr )(T) = -L({ER gyr )-(E)(R gyr )), (35) 

we estimate the temperature region of the ground-state - 
globule transition to be within Tq 1 -* ps 0.24 and « 
0.28. The globule - random coil transition takes place 
between T\ (1) « 0.53 and t[ 2) « 0.70. 

In Fig. 6 we show variances <7q = (O 2 ) — (O) 2 as 
functions of temperature for O = E, R ee , and i? gyr - 
We observe weak "shoulders" around T = 0.3 (to see 
this for a 2 would require, however, an even higher res- 
olution of the plot), close to the interval, where the 
low-temperature transition is expected. The situation 
is much more diffuse in the temperature region, where 
the globule - random coil transition should take place. 
The variance of the energy a 2 E has a peak at T = 0.73, 
near the temperatures of the corresponding peaks of the 
derivatives (34) and (35) plotted in Fig. 5. The variances 
of the end-to-end distance a 2 e and the radius of gyra- 
tion cTg yr , however, do not exhibit at all a peak near this 
temperature. Obviously, there is no unique behaviour of 
these quantities which are usually used to identify confor- 
mational transitions in this temperature region. In Fig. 7 
we have plotted the canonical distributions P%2 n ' T (E) for 
different temperatures in the vicinity of the two transi- 
tions. From Fig. 7(a) we read off that the distributions 
possess two peaks at temperatures within that region 
where the ground-state - globule transition takes place. 
This is interpreted as indication of a "first-ordcr-likc" 
transition [19]. The behaviour in the vicinity of the glob- 
ule - random coil transition is less spectacular as can be 
seen in Fig. 7(b), and since the energy distribution shows 
up one peak only, this transition could be denoted as be- 
ing "second-order-like". The width of the distributions 



FIG. 9: Mean energy (E)(T), Helmholtz free energy F(T), 
and entropy S(T) for 48mers with the sequences given in Ta- 
ble II. 

grows with increasing temperature until it has reached its 
maximum value which is located near T w 0.7, cf. Fig. 6. 
For higher temperatures, the distributions become nar- 
rower again. 

Since finite-size scaling is impossible because of 
the non-continuable sequences of different types of 
monomers, "transitions" between classes of protein 
shapes are, of course, to be distinguished from phase 
transitions in the strict thermodynamic sense. In con- 
clusion, conformational transitions for polymers of finite 
size, such as proteins, are usually weak and therefore 
difficult to identify. Thus, these considerations are, of 
course, of limited thermodynamic significance. From a 
technical point of view, however, it is of some importance 
as Markovian Monte Carlo algorithms can show up prob- 
lems with sampling the entire energy space, as the proba- 
bility in the gap between the two peaks can be suppressed 
by many orders of magnitude (what is obviously not the 
case in our example of the 42mer) and tunnelings are 
extremely rare. Just for such situations, flat histogram 
algorithms have primarily been developed [1, 2]. 

B. Ten Designed 48mers 

We have also analysed the ten designed sequences with 
48 monomers given in Ref. [20]. The ratio between the 
numbers of hydrophobic and polar residues is one half for 
these HP proteins, i.e., the hydrophobicity is hr = 24. 
In Table II we have listed the sequences and ground- 
state properties. The minimum energies we found coin- 
cide with the values given in Refs. [6, 7, 20]. Figure 8 
shows the densities of states for selected 48mers and the 
multicanonical histograms of the production run. Note 
that for Rosenbluth chain growth methods (a-thermal 
or at (3 = 0) the histogram for chains of length N is 
obtained by accumulating their individual Rosenbluth 
weights W]$, which explains the poorer performance near 
the minimum energy, where a small number of states en- 
ters with big weights. This differs from the usual pro- 
cedure in algorithms with importance sampling, where 
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FIG. 10: Heat capacities CV(T), mean end-to-end distances {R e 
48mers from Table II. 



,)(T), and mean radii of gyration (R gyr )(T) of the ten designed 
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TABLE II: Properties of the 48mers. For each of the sequences we have listed the ground-state energy E mln and the ground-state 
degeneracy g estimated with our algorithm. For comparison, we have also quoted the lower bounds on native degeneracies 
5chcc obtained by means of the CHCC (constrained-based hydrophobic core construction) method [21] as given in Ref. [20]. 
In both cases the constant factor 48 from rotational and reflection symmetries of conformations spreading into all three spatial 
directions was divided out. 



No. 


sequence 


F, ■ 


3o (xlO 3 ) 


ffCHCC ( X 


48.1 


HPH2P2H 4 PH3P2H2P2HPH3PHPH2P2H 2 P3HP 8 H2 


-32 


5226 ± 812 


1500 


48.2 


H4PH2PH5P2HP2H2P2HP6HP2HP3HP2H2P2H3PH 


-34 


17 ± 8 


14 


48.3 


PHPH2PH6P2HPHP2HPH2PHPHP3HP2H2P2H2P2HPHP2HP 


-34 


6.6 ± 2.8 


5.0 


48.4 


PHPH2P2HPH3P2H2PH2P3H5P2HPH2PHPHP4HP2HPHP 


-33 


60 ± 13 


62 


48.5 


P2HP3HPH 4 P2H 4 PH2PH3P2HPHPHP2HP 6 H2PH2PH 


-32 


1200 ± 332 


54 


48.6 


H3P3H2PHPH2PH2PH2PHP7HPHP2HP3HP2H6PH 


-32 


96 ± 19 


52 


48.7 


PHP4HPH3PHPH4PH2PH2P3HPHP3H3P2H2P2H2P3H 


-32 


58 ± 21 


59 


48.8 


PH2PH3PH4P2H3P6HPH2P2H2PHP3H2PHPHPH2P3 


-31 


22201 ± 6594 


306 


48.9 


PHPHP 4 HPHPHP2HPH 6 P2H3PHP2HPH2P2HPH 3 P4H 


-34 


1.4 ± 0.5 


1.0 


48.10 PH2P6H2P3H3PHP 2 HPH2P2HP2HP2H2P2H 7 P2H 2 


-33 


187 ± 87 


188 



the counter of an energy bin being hit by an appropriate 
state is incremented by unity. 

In Fig. 9 we have plotted the mean energy, free en- 
ergy, and entropy as functions of temperature for these 
lattice proteins. These results were obtained by means 
of the density of states as sampled with our algorithm. 
For T —> the curves for both (E) and F merge into 
the four different values of E m i n (= —34, —33, —32, 
—31) while the entropy exhibits the ground-state de- 
generacies, S — ► In go- Our estimates for the degen- 
eracies go of the ground-state energies, and for compar- 
ison, the lower bounds 5chcc gi verL m R°f- [20], arc 
listed in Table II. The lower bounds were obtained 
with the constrained-based hydrophobic core construc- 
tion (CHCC) method [21]. Our values lie indeed above 
these lower bounds or include it within the range of sta- 
tistical errors. Notice that for the sequences 48.1, 48.5, 
and 48.8, our estimates for the ground-state degeneracy 
are much higher than the bounds <?c H cc- ^ n tncse cases 
the smallest frame containing the entire hydrophobic core 
is rather large (cube containing 4 x 3 x 3 = 36 monomers 
with surface area A = 32 [bond length] 2 ) such that enu- 
meration of this frame is cumbersome. For 48.5 and 48.8, 
we further found ground-state conformations lying in less 
compact frames (48.5: A = 32,40,42,48,52,54 [bond 
length] 2 , 48.8: A = 32,40,42 [bond length] 2 ) and those 
conformations would require still more effort to be identi- 
fied with the CHCC algorithm, which was designed to lo- 
cate global energy minima and therefore starts the search 
beginning from the most compact hydrophobic frames. 
The ground-state energies of these examples are rather 
high (E min = -31 for 48.8, and E min = -32 for 48.1 
and 48.5) and therefore a higher degeneracy seems to be 
natural. This is, however, only true, if there does not 
exist a conformational barrier that separates the com- 
pact H-core low-energy states from the general compact 
globules. Comparing the ground-state degeneracies and 
the low-temperature behaviour of the specific heats for 



the sequences 48.1, 48.5, 48.6, and 48.7 (all of them hav- 
ing global energy minima with i? m i n = —32) as shown in 
Figs. 8 and 10, respectively, we observe that 48.6 and 48.7 
with rather low ground-state degeneracy actually possess 
a pronounced low-temperature peak in the specific heat, 
while the higher-degenerate proteins 48.1 and 48.5 only 
show up a weak indication of a structural transition at 
low temperatures. The HP proteins 48.2, 48.3, and 48.9, 
which have the lowest minimum energy E m i n = — 34 
among the examples in Table II, have also the lowest 
ground-state degeneracies. These three candidates seem 
indeed to exhibit a rather strong ground-state - globule 
transition, as can be read off from the associated specific 
heats in Fig. 10. 

We have again measured the mean end-to-end dis- 
tances and mean radii of gyration which are also plotted 
as functions of temperature into Fig. 10. Both quanti- 
ties usually serve to interpret the conformational com- 
pactness of polymers. For HP proteins, the end-to-end 
distance is strongly influenced, however, by the types of 
monomers attached to the ends of the chain. It is eas- 
ily seen from the figures that the 48mers with sequences 
starting and ending with a hydrophobic residue (48.1, 
48.2, and 48.6) have a smaller mean end-to-end distance 
at low temperatures than the other examples from Ta- 
ble II. The reason is that the ends can form hydrophobic 
contacts and therefore a reduction of the energy can be 
achieved. Thus, in these cases contacts between ends are 
usually favourable and the mean end-to-end distance is 
close to the mean radius of gyration. Interestingly, there 
exists indeed a crossover region, where (R ee ) < (R gy i)- 
Comparing with the behaviour of the specific heat, this 
interval is close to the region, where the phase domi- 
nated by low-energy states crosses over to the globule- 
favoured phase. The hydrophobic contact between the 
ends is strong enough to resist the thermal fluctuations 
in that temperature interval. The reason is that, once 
such a hydrophobic contact between the ends is estab- 
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lished, usually other in-chain hydrophobic monomers are 
attracted and form a hydrophobic core surrounding the 
end-to-end contact. Thus, before the contact between 
the ends is broken, an increase of the temperature first 
leads to a melting of the surrounding contacts. The en- 
tropic freedom to form new conformations is large since 
the low-energy states are all relatively high degenerate 
and do not possess symmetries requiring an appropriate 
amount of heat to be broken. For sequences possessing 
mixed or purely polar ends, the mean end-to-end dis- 
tance and mean radius of gyration differ much stronger, 
as there is no energetic reason, why the ends should be 
next neighbours. 

In conclusion, we see that for longer chains the strength 
of the low-temperature transition not only depends 
on low ground-state degeneracies as it does for short 
chains [15]. Rather, the influence of the higher-excited 
states cannot be neglected. A striking example is se- 
quence 48.4 with rather low ground-state degeneracy, but 
only weak signals for a low-temperature transition. 



C. Beyond 100 Monomers ... 

The final example we applied our algorithm to was 
a 103mer as proposed in Ref. [22]. Until recently, the 
"ground state" was believed to have energy E min = 
—49 [23]. The up-to-now best estimate was found with 
nPERMis to be E m i n = —54 [6] and, with an additional 
bias suppressing contacts between H and P monomers, 
even E m { n — — 55 [7]. Our algorithm not only decreased 
the lowest-energy value to E min = — 56 (see Fig. 11), but 
also enabled us to obtain results for the thermodynamic 
quantities as in the previous examples. Figure 12 shows 
the density of states which covers more than 50 orders of 
magnitude from which we determined the specific heat 
shown in Fig. 13. The degeneracy of the lowest-energy 
state -Emin = —56 was determined to be of order 10 16 
such that it seems likely that there exist still one or more 
even lower-lying energetic states. 





FIG. 12: Density of states for the 103mer, ranging over more 
than 50 orders of magnitude. 



VII. SUMMARY 

In order to study heteropolymers at very low temper- 
atures with reasonable accuracy, we developed a multi- 
canonical chain growth algorithm based on recently im- 
proved variants of PERM. Comparing with exact enu- 
meration data for HP lattice proteins with 14 monomers, 
we validated that our method is suitable to accurately de- 
termine thermodynamic quantities for all temperatures. 
Then, we applied this algorithm to lattice proteins with 
sequences of more than 40 monomers known from lit- 
erature. Additionally, we determined statistical proper- 
ties for all temperatures for examples with up to 103 
monomers. Since our algorithm allows the estimation of 
the degeneracy of the energy states, wc determined for 
all sequences the ground-state degeneracy as it is an in- 
dication for the "uniqueness" of the native state. 

In particular, we presented a detailed investigation of a 
sequence of 42 monomers that has interesting character- 
istic properties, e.g., a quite low ground-state degeneracy. 
Since some results regarding the ground states and ther- 
modynamic properties were available [19] it was a good 
candidate for testing our algorithm and for checking the 
performance of our method. For this sequence we anal- 
ysed in detail the temperature-dependent behaviour of 
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FIG. 11: Conformation of the 103mer with the lowest energy 
found, -E m in = —56. 



FIG. 13: Specific heat of the 103mer. 
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radius of gyration, end-to-end distance, as well as their 
fluctuations, and compared it with the specific heat in 
order to elaborate relations between characteristic prop- 
erties of these curves (peaks, "shoulders") and conforma- 
tional transitions not being transitions in a strict ther- 
modynamic sense due to the impossibility to formulate a 
thermodynamic limit for proteins. Therefore, we identi- 
fied temperature regions, where global changes of protein 
conformations occur. 

Furthermore, we studied energetic and conformational 
thermodynamic quantities for the famous list of ten 
48mcrs given in Ref. [20] in great detail. These are nice 
examples as they allow for the comparison of lattice pro- 
teins with similar properties (same number of monomers, 
identical hydrophobicity) but sequences that differ in the 



order of hydrophobic and polar monomers. This also 
allowed us to study how conformational properties and 
the strength of shape transitions depend on the protein 
sequence. As an interesting by-product, we not only 
confirmed the known global-minimum energies for these 
examples, but we even found a new minimum for the 
103mer being the longest sequence under consideration. 
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